テクニカルレポート | GRACEセンター

(1)

ISSN 1884-0760

GRACE TECHNICAL REPORTS

An Order-Sensitive Fusion for XQuery

Hiroyuki Kato

Soichiro Hidaka

Zhenjiang Hu

Keisuke Nakano

Yasunori Ishihara

GRACE-TR 2009–04 September 2009

CENTER FOR GLOBAL RESEARCH IN

ADVANCED SOFTWARE SCIENCE AND ENGINEERING NATIONAL INSTITUTE OF INFORMATICS

(2)

(3)

An Order-Sensitive Fusion for XQuery

Hiroyuki Kato Soichiro Hidaka Zhenjiang Hu National Institute of Informatics

2-1-2 Hitotsubashi, Chiyoda-ku Tokyo 101-8430, Japan {kato,hidaka,hu}@nii.ac.jp

Keisuke Nakano

The University of Electro-Communications 1-5-1 Chofugaoka, Chofu-shi

Tokyo 182-8585, Japan [email protected]

Yasunori Ishihara Osaka University 1-5 Yamadagaoka, Suita-shi

Osaka 565-0871, Japan [email protected]

September 2, 2009

Abstract

In XQuery, composite expressions with node creation are typical in prac-tice, for example, in data integration systems for XML with XQuery as schema mapping in addition to the classical view resolution. We propose a fusion algorithm for this kind of composite XQuery expressions. In devel-oping the XQuery fusion, there is a problem that naive elimination of node creations does not preserve the semantics of XQuery with constraints of the order of nodes. We solve this problem by introducing an adornment code called extended Dewey’s assigned to the occurrences of expressions. In this paper, we show that XML fragments created dynamically as intermediate re-sults in a store can be emulated statically in such a way that rewriting XQuery expressions to avoid redundant parts is enabled using the extended Dewey’s. The experimental results show that under a multi-step schema mapping sce-nario, our prototype system successfully eliminates execution cost of redun-dant node creations produced as intermediate results.

1 Introduction

(4)

in a tree. This order plays an important role in the semantics of XQuery, especially in node creations and axis accesses. An XQuery expression is evaluated against an XML store which contains XML fragments with their document order. This store contains the fragments that are created as intermediate results, in addition to initial XML documents Hidders et al. [2004]. A node creation by an element constructor generates a new node which is placed at an arbitrary position in document order between the already existing trees. An axis access by a step expression returns its result in document order and without duplicates.

Rewriting composite expressions based on eliminating intermediate results generated by redundant expressions is a traditional optimization technique (known as fusion) Wadler [1988], Chin [1992], Fegaras and Maier [2000] in both pro-gramming languages community and database community. In XQuery, composite expressions for node creation are typical in practice, for example, in data inte-gration systems for XML with XQuery as schema mapping Tatarinov and Halevy [2004]. We propose, in this paper, a fusion algorithm for this kind of composite XQuery.

In developing the XQuery fusion, there is a problem that naive elimination of node creations does not preserve the semantics of XQuery with constraints of the order of nodes. The XQuery fusion is more difficult than the existing fusion Wadler [1988], because naive elimination of node creations does not preserve document or-der. For example, it is incorrect to transformt($v/c,$v/a)/t/cinto$v/c. For an arbitrary store — assuming identical bindings of the externally defined variable $v— both expressions always return a value equivalent data, in the sense that they produce the same results when they are serialized and output by the query proces-sor as afinal result. However, as intermediate results in a query processor, two data evaluated by these expressions populate in different document order. When$v/c

does not result in an empty sequence1_{, the nodes produced by the former populate} in the new document order created by the element constructort($v/c,$v/a)/t, whereas the nodes returned by the latter populate in the document order existing in the input store. Consequently, if we take a further step along parent axis for both expressions, namely, t($v/c,$v/a)/t/c/.. and $v/c/.., now it is easy to see the differences since the former results in a node created by thetelement, whereas the latter results in a sequence of nodes bound to$v. Therefore, eliminating redun-dant expressions including node construction and preserving document order are con_ﬂicting requirements. The purpose of our work is to meet these two con_ﬂicting requirements.

Although expressions that contain element constructors are non-deterministic with respect to document orderPage et al. [2005], we notice two properties that (1) a node generated by an element constructor is placed at the _ﬁrst position of the document order de_ﬁned by the element constructor, (2) nodes in a sequence gener-ated by expressions occurring inside the element constructor are copied deeply and

1_{To simplify the discussion, we do not consider in this paper, the case that}_$_v/c_{results in an}

(5)

document order

a

b

c d e

f g

(1) (2)

A

B

<a> 

<c/><d/>  <e>

<f/><g/> </e> </a>

C

x x+1 x+2 x+3 x+4 x+5 x+6

Figure 1: Node creation in the document order

placed following the node in (1) above with preserving the order in the sequence. These properties enable us to emulate newly created document order, statically.

In this paper, we propose deforestation techniques for XQuery. The kind of queries on which those techniques apply are common in practice due to the use of XQuery both as a language for producing XML views from legacy sources and to query XML such as XML data integration systems with XQuery as schema map-ping. In addition, constructing trees in XQuery is expensive due to the complexity of the XML data model, making the proposed optimization particularly signi_ﬁ -cant. The fundamental idea is to support static-time rewritings for queries which perform XML navigation over newly constructed trees. Preserving the semantics of the original code is particularly challenging in the XQuery context due to the importance of node identify and document order in the XML data model and in XPath. To address those issues, we show that XML fragments created dynami-cally as intermediate results in a store can be emulated statidynami-cally in such a way that rewriting XQuery to avoid redundant expressions is enabled. This emulation is achieved by using an adornment code called the extended Dewey’s assigned to the occurrences of expressions. The Dewey encoding has been used in index structure for XML documents Lu et al. [2005], Tatarinov et al. [2002]. We have extended the Dewey code to be suitable for the semantics of XQuery, especially for “for” expressions. Note that no schema information is required in doing this rewriting.

Our main contributions can be summarized as follows.

(6)

expressions.

• By using this static emulation, we propose an XQuery fusion so that un-necessary element constructions are avoided while preserving the document order in XML.

• We show actual evaluation numbers by experimental results using a proto-type system to give an understanding of the performance bene_ﬁts.

This paper is organized as follows. After explaining our static emulation of store in Section 2, we show how fusion transformation can be correctly performed by partial evaluation of expression based on three fusion rules in Section 3. Some properties on the extended Dewey code are described in Section 4. The revised version of our fusion which handles failure cases is described in Section 5. Our experimental results are shown in Section6. We conclude the paper in Section 7.

Related work There are several studies on rewriting XQuery into XQueryGueni et al. [2008], Page et al. [2005], Koch [2005], Tatarinov and Halevy [2004]. In these, the most related is Gueni et al. [2008] in a sense of eliminating redun-dant expressions. In Gueni et al. [2008], the authors have proposed a rewriting optimization that replace the expressions, which return empty sequences, with () by the emptiness detection based on static analysis. Compared with this, our rewriting is to eliminate redundant element constructors as well as to detect empti-ness. KochKoch [2005] and Page et al.Page et al. [2005] introduced some classes for composite XQuery and proposed XQuery-to-XQuery transformations over the classes of XQuery they de_ﬁned. Their target queries don’t contain newly con-structed nodes. In real world, however, practical expressions such as schema map-ping always returns newly constructed elements. Tatarinov and Halevy proposed an ef_ﬁcient query reformulation in data integration systems, in which XML and XQuery are used for data model and schema mapping, respectively Tatarinov and Halevy [2004]. In this system, composition of element construction is typical be-cause the schema mapping that maps some element to other element involves ele-ment construction. They treat actual reformulation algorithm as a black box. Our work attempts to open the box and exploit some properties in this box.

2 Static Emulation of Store

This section describes how to achieve a static emulation of XML store. First, we introduce a notion of simple XML store using Dewey code and its order in Sec-tion 2.1. Then, we explain our static emulaSec-tion of store by using the extended Dewey code and its order.

(7)

store. An element constructor that is depicted in the upper center part of the_ﬁgure produces tree structure just below the expression (B) within which nodes are given order in one-dimensional document order axis. For example, if the topmost node named “a” is given order x, then its_ﬁrst child node named “b” is given order that is strictly greater than x, say, x+1, which is also strictly less than the order given to its children named “c” and “d”. These ordering is guaranteed to be consistent between elements created in a common element constructor.

On the other hand, order between nodes that are separately created by differ-ent elemdiffer-ent constructors in a query is implemdiffer-entation dependdiffer-ent. For example, consider the following expression(Q1)in XQuery.

Q 1. (hi//h,jk//j)

In this query, the document order between the tree with root node named “h” and the one with root node named “j” is implementation dependent. So, no one can decide the order of these two nodes by static analysis. In addition, the document order between the existing nodes – like A and C in the_figure – and a newly created node is also implementation dependent, thus static analysis can not decide this order either. However, overlap along document order axis never happens between these nodes. Extended Dewey order de_fined in this section is designed to respect all these properties, namely, (a) no order is prede_fined statically across nodes that are separately created in different element constructors in a query, (b) preorder is de_fined between nodes inside an element constructor, (c) orders given to elements that belong to different roots of trees are pair-wise disjoint.

XML store is used in the semantics of XQuery Hidders et al. [2004]2 while our algorithm is based on a static analysis. In this section we show that a static emulation of XML store can be achieved by using an extended Dewey order, which preserves the document order in terms of expressions.

2.1 Simple XML Store using Dewey Order

Dewey Order encoding of XML nodes is a lossless representation of a position in document order Lu et al. [2005], Tatarinov et al. [2002]. In Dewey Order, each node is represented by a path from a root using “.”, which is depicted by D in Figure 2: (1) a root is encoded by r ∈ S whereS is a countably in_ﬁnite set of special codes; (2) when a nodeais then-th child of a nodeb, the Dewey code of

a, did(a), is did(b).n. Note thatǫin Figure 2 is used for a termination, so every Dewey code ends withǫ.

Using Dewey encoding, sorting and duplicate elimination in document order can be achieved by a straightforward way. Now, simple XML store, in which nodes are restricted to element nodes — other nodes such as attributes are disregarded here — is de_ﬁned by an ordered tree representation using Dewey codes instead of

2_{The semantics of XQuery is formally given by World Wide Web Consortium [2007]. However,}

(8)

D ::= rX r∈ S,Sis a set of special codes.

X ::= ǫ|.B

B ::= nX n∈ I,Iis a set of integers. Figure 2: Pure Dewey code

<s> <a> </a> <c/> <c/> </s> source.xml s c r0

r0.2

a

b b

r0.1

r0.1.1 r0.1.2

c r0.3

initial store SSt

0

SSt

2= SSt0㺥SSt1

output store serialized XML <t> <c/> <c/> <a> </a> </t> SSt 1 t a b b c c r1

r1.1 r1.2

r1.3.1 r1.3.2

r1.3

Figure 3: An input document source.xml,SSt0and output store by(Q2).

nodes and edges in Hidders et al. [2004]. We assume a set of namesN used for element names and a countably in_ﬁnite set of Dewey CodeD, which is depicted by D in Figure 2. Both the strict partial order < and the equality = on D are straightforward.

DEFINITION 2.1 (Simple XML Store). A simple XML store is a 3-tuple SSt = (D, ν)where, (a)Dis aﬁnite subset ofD; (b)ν:D→ N maps the Dewey codes to their node name.

Evaluating an element constructor against an input simple store will add a tree into the input store. Consider the following XQuery expression(Q2)when given the input documentsource.xmlshown in Figure 3.

Q 2. let $v := doc("source.xml")/s return <t>$v/c,$v/a</t>

For an initial storeSSt0 = (D0, ν0)where,

• D0 ={r0, r0.1, r0.1.1, r0.1.2, r0.2, r0.3}wherer0 ∈ S;

• ν0(r0) =s, ν0(r0.1) =a, ν0(r0.1.1) =ν0(r0.1.2) =b,

ν0(r0.2) =ν0(r0.3) =cwhere{s,a,b,c} ⊂ N,

evaluating(Q2)updatesSSt0intoSSt2(D2, ν2)where,

• D2 =D0∪ {r1, r1.1, r1.2, r1.3, r1.3.1, r1.3.2}

(9)

• ν2=ν0+{r1 →t, r1.1→c, r1.2→c,

r1.3→a, r1.3.1→b, r1.3.2→ b}

wheret∈ N

This updating is achieved by the following steps in a recursive way for nested element constructors. (i) Generate a new root code r ∈ S for an element con-structor. (ii) Reassign Dewey codes for values produced by evaluated expressions occurring inside the element constructor. Note that once the data is serialized, the information about document order associated with nodes is lost.

2.2 Emulating Simple Store

In this subsection, we will show that static emulation of newly created XML frag-ments in simple store is achieved by using the extended Dewey Order encoding of expressions. The purpose of this encoding is to allow operation like sorting, axis access and duplicate elimination on expression rather than on the dynamic store.

When expressions contain element constructors, the semantics of XQuery re-quires; (1) a node generated by an element constructor is placed at the _ﬁrst po-sition of the document order de_ﬁned by the element constructor, (2) nodes in a sequence generated by expressions occurring inside the element constructor are copied deeply and placed following the node in (1) above with preserving the or-der in the sequenceHidor-ders et al. [2004]. This requirement leads to the following properties. Note that for an expressionewe useefor Dewey Order encoding of evaluated data against an arbitrary store(D, ν).

PROPERTY2.2. For an element constructor, ene/en, whereenis an element name andeis an expression,

(i) ene/en=rwherer ∈ S ∧r /∈D

(ii) ∀d∈e,d=<en>e</en>.n3wherenis an integer.

(iii) wheneis a sequence constructor(e1, e2),

∀d1 ∈e1∀d2 ∈e2, d1< d2

Figure 4 shows this property using concrete examples. This property enables us to statically emulate newly created XML fragments — created by element con-structors — in simple store. This emulation is achieved by Dewey encoding of expressions which exploits PROPERTY2.2.

In this paper, as will be seen in next section we extend Dewey code and its order by introducing new delimiter “#” to be suitable for the semantics of “for” expres-sions in XQuery. From now on to the end of this section, we will see the property of the “for” expressions occurring inside element constructors and describe the role of the new delimiter “#”. Figure 4 shows such a property of the “for” expression (Q3), bellow.

3

We use∈_{for sequence containment. And we treat an item identically to a sequence containing}

(10)

<a> {$v/c} </a>

a

c c ... c

<a> {($v/c,

$v/d)} </a>

a

c ... c d

[[$v/c]]

... d

[[$v/c]] [[$v/d]]

<a>

{for $v in e_b

return ($v/c,$u/d)} </a>

a

c ... c d ... d

[[v

1/c]] [[v1/d]]

c ... c d ... d

[[v

n/c]] [[vn/d]]

...

[[e_b]]=<v

1,...,vn>

where,

Figure 4: A simple example for the document order in element creations

Q 3. afor $v in eb return ($v/c,$v/d)/a

The semantics of a “for” expression is to evaluate the “return” expression k

times wherekis the length of the sequence, which is the result of the expression followed by “in”. So, for the ordered tree which is the result of(Q3), the child nodes of the root are the sequence of elements, which is the deep copied sequence of the result of evaluating($v/c,$v/d) zero or more times.

For the expressions (Q3)/dand(Q3)/cwe can get easily thevalue equiva-lentexpressions (Q4)and(Q5), respectively.

Q 4. for $u in eb return $v/d Q 5. for $u in eb return $v/c

Then, consider the expression(((Q4)),((Q5)))/self:: ∗. As described in the previous subsection, since axis access by “/” requires the sorting and duplicate elimination in document order, the correct transformation of this expression should result in(Q6), in which two “for” expressions(Q4)and(Q5)are merged, sort-ing expressions appeared in the “return” expression.

Q 6. for $u in eb return ($v/c,$v/d)

(11)

e ::= c constants

| $v variables

| (e, e, ..., e) sequence constructions

| e/α::en location step expressions

| for $v in e return e for-exp.

| let $v := e return e let-expressions

| ene/en element constructor Figure 5: XQuery

B ::= (n|?)X n∈ I

X ::= ǫ|.B |#[B, . . . , B]

D ::= B|ǫ|rX |# [D, . . . , D] r ∈ S

Figure 6: Abstract syntax of the extended Dewey code

expressions. As will be seen the next section, when we have a same pre_ﬁx until “#” delimiter for given two extended Dewey codes, the sorting and duplicate elimina-tion on this codes requires merging into one code to merge two “for” expressions into one “for” expression.

For example, the extended Dewey encoding of the “for” expression in(Q3)is

r.1#[1,2], where ris the extended Dewey encoding of(Q3). And the extended Dewey encoding of(Q4)and(Q5)arer.1#2andr.1#1, respectively.

3 Algorithm Overview

In this section, we brie_ﬂy overview our algorithm for automatic fusion of XQuery expressions so that unnecessary element constructions can be correctly eliminated. The purpose of this section is on conveying the essential idea we use. The detailed and precise algorithm will be given in Section 5. Basically, we will focus on fusing the following subexpression

e/α::en

so that unnecessary element construction in the query expression ineis eliminated under the context of “selection” byα::en.

Note that the XQuery fusion does not always succeed. We will describe the algorithm handling failure cases in Section 5.

3.1 Annotated XQuery Expressions

(12)

ed _::= _cd_|_$_vd

|(ed_{, e}d_{, ..., e}d₎d_|₍_ed_/α_::_en₎d

| (for $v in ed _return _ed₎d

| (let $v := ed _return _ed₎d

| (ened_/en₎d

Figure 7: Annotated XQuery

Γ⊢c→cǫ (PCST)

Γ(v).btype=”let”

Γ⊢$v→Γ(v).expr (PLVAR)

Γ(v).btype=”for”

Γ⊢$v→$v1 (PFVAR)

Γ⊢e1 →e′1d1 . . . Γ⊢eN →e′ndN

Γ⊢(e1, . . . , eN)→flatten(e′₁d1, . . . , e

′dN

N )[d 1,...,dN]

(PSEQ)

Γ⊢e1 →e′1d1 Γ∪ {v→(e

′d1

1 ,”let”)} ⊢e2 →e′2d2

Γ⊢let $v := e1 return e2 →e′2d2

(PLET)

Γ⊢e1 →e′₁d1 Γ∪ {v →(e₁′d1,”for”)} ⊢e2 →e′₂d2

Γ⊢for $v in e1 return e2→(for $v in e′

d₁

1 return e ′d₂

2 )#

d2 (PFOR)

e→e′d _e′′d′

=child fusione′d_en

Γ⊢e/child::en→e′′d′ (PCSTP)

=self fusione′d_en

Γ⊢e/self::en→e′′d′ (PSSTP)

=parent fusione′d_en

Γ⊢e/parent::en→e′′d′ (PPSTP)

Γ⊢e→e′d1 _d

2=new rootD e′d3 =dc assign e′ d2.1

Γ⊢ ene/en → ene′d3_/end2 (PELM) Figure 8: An Overview of Our XQuery Fusion

test which can be a tag name or∗ (an arbitrary tag), a “for” expression, a “let” expression, or an element construction expressionene/en.

(13)

code is given in Figure 6. Informally, we may consider it as

d::= ǫ|r.d|r#[d1, . . . , dm]

where ǫ denotes the unknown code. This code is assigned to both expressions which are occurred in outside of element constructors and ones which cannot be partial evaluated. Again note that the XQuery fusion does not always succeed. The algorithm handling the fail case is shown in Section 5.

The partial order on the extended Dewey codes are essentially the dictionary order. For example, r.1.2 < r.1.3, r.1 < r.1.2 hold. But the following pairs of codes are incomparable: (ǫ, r) is incomparable because ǫ is the unknown code; (r,r′_{) is incomparable if}_r₌_r′_{; and (}_r._1#[3]_,_r._1#[1_,_2]_{) is incomparable because}

they represent interleaved document orders of the elements produced by a “for” expression. Whereas, as will be seen later, sorting and duplicate elimination on [r.1#[3], r.1#[1,2]] results in r.2#[1,2,3] to merge two “for” expressions into one “for” expression because they have the same pre_ﬁx until #, say r.1. This operation will be appeared asremove duplicate (dc sort [r.1#[3], r.1#[1,2]]) in Section 3.2 and as[]∼D⊎

⊕D

≺D[r.1#[3], r.1#[1,2]]in Section 4.

Now we can add annotations of the extended Dewey codes to XQuery expres-sion as in Figure 7. We sometimes omit the annotation if it is clear from the context. To simplify our presentation, we will assume that there is a global environment for storing all annotated expressions during our fusion transformation, and a function

getExpGlobal(r)

that can be used to extract the expression whose code isrfrom the global environ-ment.

3.2 Fusion Transformation

Figure 8 summarizes our fusion transformation on XQuery expressions. This fu-sion is de_ﬁned in terms of a set of inference rules. In these rules, a judgment of the form4

Γ⊢e→ed

indicate that, for a given environment Γ (mapping XQuery variables bound by “let” or “for” to annotated expressions), the XQuery expression ecomplies into the annotated expressionsed_{. As will be seen later, the annotation is used to keep} track of information of the order and the context among expressions, and it plays an important role in our fusion transformation. When the fusion transformation is ﬁnished, we can ignore all the annotation and give a normal XQuery expression as the_ﬁnal result.

The de_ﬁnition of our fusion in Figure 8 is rather straightforward. For a con-stant expression c, we return itself but annotate it with the Dewey codeǫ. For a

(14)

dc assignc r=cr dc assign$v r= $vr

dc assign(e/c) r = (e/c)r (DCSTP)

dc assign(e1, . . . , en) r = (e1, . . . , en)[r1,...,rn]

wherer0 =r e′i =dc assigneiri−1 ri =succ(extract dce′i) (DCPSEQ)

dc assign(<t>e</t>) r =<t>e′</t>r

wheree′=dc assignei r.1 (DCPEC)

dc assign(for $v in e0 return e) r = (for $v in e0 return e′)r#

bs

wheree′ =dc assigne0 bs =extract dce′

(DCPFOR)

Figure 9: Dewey code propagation

variable, if it is bounded by the outside “let”, we retrieve its corresponding anno-tated expression from the environment, otherwise it must be a variable bound by the outside “for” and we put the extended Dewey code 1 to the variable when its corresponding expression has an extended Dewey code except forǫ. Otherwise we putǫto the variable bound by the outside “for”. Note that the reason why the anno-tation of the variable bound by “for” always is 1 is described later. for a sequence expression, we partially evaluate each element expression, and then group them to a new sequence annotated with a Dewey that are gathered from the result of each element expression. Note that we use flatten to remove nested sequences (e.g.,

flatten((er1

11, er

2

12)[r1,r2], er

3

3 )[[r1,r2],r3]= (er

1

11, er

2

12, er

3

3 )[r1,r2,r3]). For a location step

expression e/α::en , we perform fusion transformation to eliminate unnecessary element construction ineafter partially evaluating e. We will discuss the de_ﬁ ni-tions of the three important rules (PCSTP),(PSSTP) and (PPSTP) in Section 3.2.2. For a “let” expression, weﬁrst partially evaluate the expression e1, and then

par-tially evaluatee2with an updated environment and return it as the result. Note that

this rule eliminate variables bound by “let” by expanding variables usingΓ. For a “for” expression, we do similarly as for a “let” expression except that weﬁnally produce a new “for” expression by gluing partially evaluated results together. For an element construction, after partially evaluating its content expressionetoe′_{, we}

create a new Dewey code for annotating this element, and propagate this Dewey code information to all subexpression ine′ _{(with function} _{dc assign}_{) so that we}

can access (recover) this element constructor when processing subexpressions of

e′

(15)

3.2.1 Dewey Code Propagation

Propagating the Dewey code of an element construction to its subexpressions(content expressions) plays an important role in constructing our rules (Section 3.2.2) for correct fusion transformation.

Figure 9 de_ﬁnes a functiondc assign e r:

dc assign::XQueryD →D→XQueryD

which is to propagate the Dewey coderinto an annotated expressioneby assign-ing proper new Dewey codes toeand its subexpressions. We will explain some important equations in this de_ﬁnition. Note that we write e− _{to denote that the}

Dewey code ofeis “don’t care”.

The equation (DCPSEQ) places horizontal numbering to sequence expressions. Function succ is used to enforce numbering using strictly greater value relative to previously processed expressions (e.g., succ r.1 = r.2). (DCPEC) introduces vertical structure to the numbering by initiatingdc assignfor subexpression eby adding “.1” to its second parameter. The equations that needs additional attention is (DCSTP) and (DCPFOR) above. In (DCSTP), it may seem unusual fordc assign

not to recurse subexpressione. However, considering that path expression itself do not introduce additional parent-child relationship, and thatdc assignalways han-dle expressions that is already partially evaluated, there is no additional chance to simplify the path expression further using Dewey code allocated to the subexpres-sion. Particularly characteristic equation (DCPFOR), which introduces # structure to the Dewey code, numbers the expressioneat return clause. Note that the second parameter to recursive call foreis reset to 0.bsthat re_ﬂects the horizontal structure produced by the return clause is combined by the # sign to produce r#bs as the top level code allocated to the “for” expression.

3.2.2 Fusion Rules

Our fusion transformation one/α::enis based on the three fusion rules (functions) (PCSTP), (PSSTP) and (PPSTP) in Figure 10, which correspond to three types of axis. The basic procedure is as follows: (1) Extract (get) subexpressions accord-ing to the axis α; (2) Select those who produce nodes whose name is equal to the tag nameenusing aﬁlter; (3) Sort the remained subexpressions according to their Dewey codes; (4) If the above sort step succeeds, we remove the duplicated subexpressions and return its sequence as the result, otherwise we give up fusion.

More concretely, consider the de_ﬁnition ofchild fusion. We useget children e

to get a sequence of subexpressions that contribute to producing children of the XML document that can be obtained by evaluation ofe, and usefilter(equal toen) function to keep those that are equal toenwherefilter pxs = [x|x ← xs, p x]. The resulting sequence expression is sorted according to their Dewey codes by

(16)

element subexpressions (remove duplicate), otherwise we give up fusion by re-turning the original expressione/child::en. We will show the precise algorithm handling this fail case in Section 5.

3.2.3 Examples

We demonstrate our fusion transformation by using some examples. For readabil-ity, we use “/” for “child::” and “/..” for “parent::”.

First, Figure 11 shows how an example of our fusion transformation for

t($v/c,$v/a)/t/c shown in Section 1. Note that for a space limitation, we use (A), (B), (C) and (D) instead of (PCSTP), (PELM), (PSEQ) and (PFVAR), respectively in Figure 11. So, for t($v/c,$v/a)/t/c/.., which is also from Section 1, We can get the correct expression by using the following transforma-tion;

Γ⊢ t($v/c,$v/a)/t/c→($v/c)d1.1

t($v/c,$v/a)/td1 ₌_{parent fusion} _($_v/c₎d1.1_∗

Γ⊢ t($v/c,$v/a)/t/c/.. → t($v/c,$v/a)/td1 (PPSTP)

Next, consider the following expression(Q7),

Q 7. let $v := a()/a return ($v,$v)/self::a

In(Q7), the subexpression ($v,$v)/self :: a is redundant because duplicate elimination is needed for this subexpression. Figure 12 shows this transformation. For more complicated case, we show that our extended Dewey order encoding of “for” expressions occurring inside an element constructor requires to append “#” to the extended Dewey code. This is the prominent feature of our extension to the extended Deweys’ which is explained using(((Q4)),((Q5)))/self :: ∗

described in Section 2.2. Before we explain this query, consider (Q3)which is also described in Section 2.2. Partial evaluation of (Q3)assigns the extended Dewey code shown in Figure 13. So,((Q3)/d)is transformed in the way shown in Figure 14. Also,((Q3)/c)is transformed in the way shown in Figure 15. Now, return to((Q3)d/,(Q5)/c)/self::∗. With the above results, the partial evalu-ation performs shown in Figure 16

4 Properties on Extended Dewey Order

This section describes an algebraic structure of sorting and duplicate elimination in the extended Dewey Order.

Both sorting by dc sort and duplicate elimination byremove duplicatetake place at the same time. This is achieved on a sequence of Dewey codes[d1, d2, . . . , dn]

by

[]∼D⊎

⊕D

(17)

child fusion::XQueryD→QName→XQueryD

child fusion ed_en₌

remove duplicate (e′

1, ..., e′N) ifdc sort succeeds (ed_/

child::en)ǫ _otherwise

where(e′1, ..., e′N) =dc sort(filter(equal toen)(get children e))

(CFUSION)

self fusion::XQueryD→QName →XQueryD

self fusion ed_en₌

1, ..., e′N) ifdc sort succeeds (ed_/_self_::_en₎ǫ _otherwise

where(e′1, ..., e′N) =dc sort(filter(equal toen)(get self e)) (SFUSION)

parent fusion::XQueryD →QName→XQueryD

parent fusioneden=

1, ..., e′N) ifdc sort succeeds (ed_/

parent::en)ǫ _otherwise

where(e′

1, ..., e′N) =dc sort(filter(equal toen)(get parent e))

(PFUSION)

get children::XQueryD→XQueryD

get children c= ()[] get children$v = ($v/child::∗)ǫ

get children ()[]= ()[]

get children (e1, ..., eN) =flatten ((e′1, ..., e′N)

[d1,..,dN]₎

where e′

i=get children ei di=extract dc(e′i) (GCSEQ)

get children (e/child::en) = (e/child::en/child::∗)ǫ

get children(ened/en) =ed (GCEC)

get children (for $v in e return (e1, ..., eN))r#[b1,...,bN]

=

⎛

⎜ ⎜ ⎝

for $v in e return (e11, e12, . . . , e1n1, e21, e22, . . . , e2n2,

· · ·

eN1, eN2, . . . , eN nn) ⎞

⎟ ⎟ ⎠ r′

where (ei1, . . . , eini) =get childrenei rij =extract dce′ij

r′=r# [b1.r11, . . . , b1.r1n1, b2.r21, . . . , b2.r2n2, ...

bN.rN1, . . . , bN.rN nn]

(GCFOR)

get self,get parent :: XQueryD→XQueryD get self er₌_er

get parent er.(n|?)₌_getExpGlobal₍_r₎

(18)

Γ(v).btype=”for”

Γ⊢$v→$v1 (D)

($v/c)ǫ₌_cf_$_v1_c

Γ⊢$v/c→($v/c)ǫ (A)_{· · ·} Γ⊢($v/c,$v/a)

→($v/c,$v/a)[ǫ,ǫ]

(C)

d1=new rootD ($v/c,$v/a)d′

=da($v/c,$v/a)d1.1

d′ _{= [}_d1.₁_{, d1.}_2]

Γ⊢ t($v/c,$v/a)/t →(t($v/c,$v/a)/t)d1 (

B) cf(t($v/c,$v/a)/t)d1c

= ($v/c)d1.1

Γ⊢ t($v/c,$v/a)/t/c→($v/c)d1.1 (A)

where we usecfforchild fusionanddafordc assign.

Figure 11: Our fusion transformation fort($v/c,$v/a)/t/c.

d1=new rootD Γ⊢ a()/a → a()/ad1

(PELM)

Γ′₍_v₎_.expr

=a()/ad1

Γ′_⊢_$_v → a()/ad1

(PLVAR)

Γ′₍_v₎_.expr

=a()/ad1

Γ′_⊢_$_v → a()/ad1

(PLVAR)

Γ′ _⊢_($_v,_$_v₎_→₍_a₍₎_/a_,_a₍₎_/a₎[d1,d1]

(PSEQ)

Γ′ _⊢_($_v,_$_v₎_/_self_::_a_→_a₍₎_/ad1 (PS

STP)

Γ⊢let $v := a()/a return ($v,$v)/self::a→ a()/ad1 (PLET)

where we useΓ′_for_Γ_{∪ {}_v_→₍_a₍₎_/ad1_,_”let”₎_}_.

Figure 12: Our fusion transformation oflet $v := a()/a return ($v,$v)/self::

a.

where binary operator ∼D⊎

⊕D

≺D is deﬁned below. Compatibility test between the

members of a sequence of Dewey codes — failure of the test causes a failure of the partial evaluation (which is recovered at the caller of this operation by restoring the original expression) — and the uni_ﬁcation of two Dewey codes (possibly leads to uni_ﬁcation of twoforexpressions into one) are implemented usingorderableand

· · ·

Γ⊢for $v in eb return ($v/c,$v/d)

→(for $v in eb return ($v/c,$v/d))ǫ

(PFOR)

d1=new rootD

dc assign (for $v in eb return ($v/c,$v/d))d1.1

= (for $v in eb return ($v/c,$v/d))r1.1#[1,2] Γ⊢(Q3)→ afor $v in eb return ($v/c,$v/d)/ad1

(PELM)

(19)

...

Γ⊢(Q3)→ afor $v in eb return ($v/c,$v/d)/ad1

(PELM) (for $v in eb return $v/d)d1.1#2

=child fusion afor $v in eb return ($v/c,$v/d)/ad1 d Γ⊢(Q3)/d→(for $v in eb return $v/d)d1.1#2

(PCSTP)

Figure 14: Partial evaluation of(Q3)/d ...

Γ⊢(Q3)→ afor $v in eb return ($v/c,$v/d)/ad1

(PELM) (for $v in eb return $v/d)d1.1#1

=child fusion afor $v in eb return ($v/c,$v/d)/ad1 c Γ⊢(Q3)/c→(for $v in eb return $v/c)d1.1#1

(PCSTP)

Figure 15: Partial evaluation of(Q3)/c

⊕D, respectively.

DEFINITION 4.1 (distinctly ordered sequences). For a given sequence S =

y1, y2,· · ·, yn, S is distinctly ordered under when the following conditions

hold.

• All elements ofSare in a total order under, i.e.,

∀y, z, w∈S,

yz∧zy ⇒y ∼z (1)

yz∧zw⇒yw (2)

yz∨zy (3)

and

• ThatSis strictly monotonic which is deﬁned as the followings,

1. []is a strictly monotonic, and

...

Γ⊢(Q3)/d→ed1.1#2 1

(PCSTP) ...

Γ⊢(Q3)/c→ed1.1#1 2

(PCSTP)

Γ⊢((Q3)/d,(Q3)/c)→(ed1.1#2 1 , e

d1.1#1

2 )

(PSEQ)

(for $v in eb return ($v/c,$v/d))d1.1#[1,2] =self fusion (ed1.1#2

1 , e

d1.1#1

2 )[d1.1#2,d1.1#1]∗

Γ⊢((Q3)/d,(Q3)/c)/self::∗ →(for $v in eb return ($v/c,$v/d))d1.1#[1,2]

(PSSTP)

where we usee1forfor $v in eb return $v/dande2forfor $v in eb return $v/c

(20)

2. for a strictly monotonic sequence ys,y:ys is also strictly monotonic iff.

∀y′

∈ys(y≺y′

).

PROPERTY4.2. For a given distinctly ordered sequence y:ys, the following prop-erties hold byDEFINITION4.1.

(i) x:ys is a distinctly ordered wherex∼y.

(ii) x:y:ys is a distinctly ordered wherex≺y.

DEFINITION 4.3 (Preservation of order). Binary operator ⊕deﬁned over a total order set underpreserves the order if for any elementsy1, y2 in the total order set,

y1∼y2

(y1⊕y2)∼y1

(PRESO)

holds.

Ordered insertion (one to many)(∼⊳ ⊕ ≺)

DEFINITION 4.4 (Ordered insertion ∼⊳⊕≺). Binary operator∼⊳⊕≺ returns, for a

list on the left operand, a new list in whichy on the right operand is inserted by the following inference rules.

|y| →y′

([]∼⊳ ⊕

≺y)→[y ′

]

z∼y (z⊕y)→v

((z:zs)∼⊳⊕≺y)→v:zs

y ≺z

(z:zs)∼⊳ ⊕

≺y)→ (y:z:zs)

z≺y (zs∼⊳ ⊕

≺y)→zs’

((z:zs)∼⊳ ⊕

≺y)→z:zs’

THEOREM 4.5 (Ordered insertion). For any distinctly ordered sequenceS under

,S∼⊳

⊕

≺yis also distinctly ordered underwhere⊕satisﬁes (PRESO).

Proof. Induction on the sequenceSis used for the following cases;

(i) Sis[]:[]∼⊳⊕≺y, which is[y′]by theﬁrst deﬁnition of∼⊳⊕≺, is also distinctly

(21)

(ii) Sisz:zs: It is suf_ﬁcient to examine the following each case in binary relation betweenzandy, because bothzandyare in a total order set under.

(a) z∼y:S∼⊳⊕≺y, which is(z⊕y):zsby the second deﬁnition of∼⊳⊕≺,

is also distinctly ordered by PROPERTY4.2 (i) and (PRESO). (b) y≺z:S∼⊳

⊕

≺y, which isy:z:zs by the third deﬁnition of∼⊳ ⊕ ≺, is also

distinctly ordered by PROPERTY4.2 (ii).

(c) z ≁ y∧y ⊀ z: This implies z ≺ y by totality(3). So, the fourth deﬁnition is applied. The fourth deﬁnition is rewritten as follows;

z1 ≺y ((z2:zs)∼⊳⊕≺y)→z′:zs′

((z1:z2:zs)∼⊳ ⊕

≺y)→(z1:z′:zs)

In the above inference rule,(z′_:_zs₎_{is distinctly ordered by the inductive}

hypothesis. So to see that(z1:z′:zs) is distinctly ordered, we have to

showz1≺z′because of PROPERTY4.2 (ii). Now, the following cases

in relation betweenyandz2are examined.

i. y z2: By the second and the third deﬁnition of ∼⊳⊕≺,z′ ∼ y

holds. So,(z1 ≺y∧z′ ∼y)impliesz1 ≺z′ because ofz1, y, z′

in a total order set under.

ii. z2 ≺ y: By the fourth deﬁnition of ∼⊳⊕≺, z ′ _∼

z2 holds. So,

(z′

∼z2∧z1 ≺z2)impliesz1 ≺z′.

Ordered union (many to many)(∼⊎⊕≺)

DEFINITION 4.6 (Ordered union (many to many)). For sequences in which all elements are in a total order underwhere⊕satisﬁes (PRESO), binary operator

∼⊎⊕≺is deﬁned as the following inference rules.

(zs∼⊎⊕

≺[])→zs

(zs ∼⊳⊕≺y)→zs′ zs′∼⊎ ⊕

≺ys→vs zs ∼⊎⊕

≺(y:ys)→vs

THEOREM4.7 (Ordered union). For any distinctly ordered sequenceS1under,

(S1∼⊎⊕≺S2)is also distinctly ordered underwhere⊕satisﬁes (PRESO). Proof. Induction on the sequenceS2is used for the following cases;

(i) S2 is []: (S1 ∼⊎⊕≺ []), which is S1 by the ﬁrst deﬁnition of ∼⊎⊕≺, is also

(22)

(ii) S2 isy:ys: For any distinctly ordered sequenceS1 under,(S1 ∼⊎⊕≺ys)

is also distinctly ordered sequence under by the inductive hypothesis. By THEOREM 4.5, (S1 ∼⊳

⊕

≺ y) is a distinctly ordered sequence. So,

(S1 ∼⊎⊕≺ y:ys) is a distinctly ordered sequence by the second deﬁnition

of∼⊎⊕ ≺.

Now, we can de_fine the three important operators used in ordered union onD, strict partial order≺D, uni_fiable∼Dand uni_fication⊕D.

DEFINITION4.8 (Strict Partial Order onD(≺) ). The strict partial order onDis deﬁned as follows;

r1 =r2 x1≺xx2

r1 x1 ≺D r2x2

(x=ǫ)∨(x =.?x′₎ ǫ≺X x

b1 ≺Bb2

.b1≺X .b2

(n1 < n2)∨(n1=n2∧x1 ≺X x2)

n1x1 ≺Bn2x2

DEFINITION4.9 (Uni_ﬁable onD(∼)). The binary operator∼onDis deﬁned as follows;

r1 =r2 x1∼X x2

r1 x1 ∼D r2x2

n1 =n2 x1 ∼X x2

n1x1 ∼Bn2x2

ǫ∼X ǫ

b1 ∼Bb2

.b1∼X .b2

(23)

DEFINITION4.10 (Uni_ﬁcation onD(⊕)). The binary operator⊕onDis deﬁned as follows;

r1x1 ∼D r2x2 (x1⊕Xx2)→x

(r1x1⊕Dr2x2)→r1x

n1x1∼B n2x2 (x1⊕X x2)→x

(n1x1⊕Bn2x2)→n1x

(ǫ⊕X ǫ)→ǫ

.b1 ∼X .b2 (b1⊕Bb2)→b

(.b1⊕X.b2)→.b

#bs1 ∼X #bs2 []∼B⊎

⊕B

≺B (bs1++bs2)→bs

(#bs1⊕X #bs2)→#bs

DEFINITION4.11 (Orderable on a sequence of the extended Dewey code). For a sequence of the extended Dewey code ds, unary predicateorderableis deﬁned as the following inference rule.

allord ds

orderable ds

where

allord[]

allord d:[]

∀d′

∈ds,ord d d′

allord d:ds

((d1 ≺d2)∨(d2≺d1)∨(d1∼d2))

ord d1d2

5 The Revised Algorithm

(24)

5.1 Handling failure cases

To propagate success or failure of our fusion, we extend the return values of the three kinds of fusion functions by adding boolean values which indicate the status ofdc sort. Ifdc sortsucceeds every fusion function returns withtrue, otherwise

withfalse.

child fusion::XQueryD →QName →(Bool,XQueryD)

self fusion::XQueryD →QName→(Bool,XQueryD)

parent fusion::XQueryD →QName →(Bool,XQueryD)

Figure 17 shows the revised version of our fusion transformation on XQuery ex-pressions handling the failure cases. Now, the form of the judgement is changed into

Γ⊢e→(true/false, ed)

which indicates that for a given environmentΓ(mapping XQuery variables bound by “let” or “for” to annotated expressions), the XQuery expressionecompiles into the annotated expressions ed _{with Boolean value which indicates that the fusion} succeeds (true) or not (false). This Boolean values play an important role in fusing “let” and “for” expressions. For both a constant expression and a variable, our fusion functions return with true. For a sequence expression, when one or more subexpressions failed, there are two candidates; (1) recovering all the ele-ment expressions, including ones fused successfully, back to input expressions; or (2) doing nothing, namely there are fused expressions and nonfused expressions mixed together because when thedc sort fails the input expressions are returned with annotationǫ. We chose (2) because for a sequence expressions as a top-level expression, (2) is more ef_ﬁcient than (1). This choice makes the treatment of “let” and “for” expressions in our fusion be intricate. For a “let” expression, when the fusion ofe1failed, fusinge2 should be performed under an updated environment

with not the result of fusinge1 but the inpute1 with annotating ǫbecause of our

choice of (2) in fusing sequence expressions. We use a trinary operator,

b?e1 : e2

for a Boolean value b, two expressionse1 and e2. This operator resultse1 when

b istrue and results e2 when bis false. For a “for” expression, we do

(25)

Γ⊢c→(true,cǫ₎ (PCST’) Γ(v).btype=”let”

Γ⊢$v→(true,Γ(v).expr) (PLVAR’) Γ(v).btype=”for”

Γ⊢$v→(true,$v1₎ (PFVAR’)

Γ⊢e1→(b1, e′d1

1 ) . . . Γ⊢eN →(bN, e′ndN) e′′ d′

=flatten ((e′d1

1 , . . . , e′ndN)[

d1,..,dN]₎

Γ⊢(e1, . . . , eN)→(b1∧. . .∧bN, e′′d

′

)

(PSEQ’)

Γ⊢e1→(b1, e′d1

1 ) Γ∪ {v→(b1?e ′d1

1 : eǫ1)} ⊢e2→(b2, e ′d2 2 )

Γ⊢let $v := e1 return e2→(true,(b2?e′d2

2 : let $v :=e1 return e2)

(PLET’)

Γ⊢e1→(b1, e′d1

1 ) Γ∪ {v→(b1?e′

d1

1 : eǫ1)} ⊢e2→(b2, e′

d2 2 )

Γ⊢for $vin e1 return e2

→(true,(for $vin (b1?e

′d1

1 : e

ǫ

1) return (b2?e

′d2

2 : e

ǫ

2))(

b2? #d2:ǫ)₎

(PFOR’)

Γ⊢e→(b1, e′d_{) (}_{b2, e}′′d′

) =child fusione′d_en Γ⊢e/child::en→(b2, e′′d′

) (PCSTP’)

Γ⊢e→(b1, e′d_{) (}_{b2, e}′′d′

) =self fusione′d_en Γ⊢e/self::en→(b2, e′′d′

) (PSSTP’)

Γ⊢e→(b1, e′d₎ ₍_{b2, e}′′d′

) =parent fusione′d_en Γ⊢e/parent::en→(b2, e′′d′

) (PPSTP’)

Γ⊢e→(b1, e′d1₎ _d2₌

new rootD e′d3₌

dc assign e′_d2.₁

Γ⊢ ene/en →(true,ene′d3/end2) (PELM’)

Figure 17: XQuery fusion handling failure

6 Experimental Results

6.1 System overview

We have implemented a prototype system in Objective Caml. It consists of about 4600 lines of code. Although the framework has been represented using simple function de_ﬁnitions, actual implementation uses more complex structure to achieve static emulation of the store more precisely. Main enhancements in the actual implementation are:

• achieving both sorting and duplicate elimination in the extended Dewey or-der simultaneously using one higher-oror-der function exploiting the algebraic structure shown in Section 4.

• keeping track of the success or failure of the partial evaluation in order to recover original expression when subexpression fails to simplify.

• maintaining the global environment for storing all annotated expressions dur-ing our fusion transformation as 4-ary relation of(e, o, c, d)where,

(26)

s a b ... 100/1000 ... b

b b b

100/1000

c

d ... d

10000 r1 ... a b b b

b ... b

100/1000 100/1000 r2 ... a b b b

b ... b

100/1000 100/1000

1 step

Figure 18: Schema mappings in Qa

0.2 0.4 0.6 0.8 1 1.2 1.4 1.6

1 5 10 15 20 25 30

ti me [s] steps N(100) R+O(100) O(100) N(1000) R+O(1000) O(1000) 0 2 4 6 8 10 12 14 16 18 20

1 5 10 15 20 25 30

ti me [s] steps N(10) R+O(10) O(10) N(100) R+O(100) O(100)

Figure 19: Times forQa(left) andQb(right).

– since annotations for the extended Deweys’ are associated with nodes of abstract syntax trees, the node-idoneeds to be maintained,

– cdenotes a context of an expressione. For example, wheneoccurs in areturn expression of afor expression, we need to keep this context as Deweys’ pre_ﬁx.

– ddenotes a Dewey code ofe.

Our fusion algorithm adds information for annotated XQuery to this relation. The functiongetExpGlobal(r)is implemented by using this relation. Currently it works stand-alone reading XQuery expression from standard input and produces rewritten XQuery to standard output.

6.2 Settings and Results

While actual evaluation numbers are predictable, for example from Manolescu et al. [2006], we have tested two kinds of queriesQa(n)andQb(n)usingGalax5 version 1.0 on 2.6GHz Intel Core2 Duo with 4GB RAM, running MacOS 10.5.6. Both queries are synthetic for XML data integration systems with n steps as schema mappings inspired by Tatarinov and Halevy [2004].

Qa(n), which is for a document “d1.xml” is shown in Figure 20. In the “d1.xml”, the root nodeshas three child nodesa,bandcshown as the left-most

5

(27)

let $r1:=<r1>{let$s:=doc(“d1.xml”)/s

return(<a>{$s/b/b}</a>, {$s/a/b})}</r1>

return

let $r2:=<r2>{(<a>{$r1/b/b}</a>, {$r1/a/b})}</r2>

return

...

let $rn:=<rn>{(<a>{$r(n-1)/b/b}</a>, {$r(n-1)/a/b})}</rn>

return

let$v:=($rn/b,$rn/a)

return$v/b

Figure 20: Test queries Qa(n)

tree in Figure 18. We prepared two documents, in which the number ofbelements at level 3 under thea and belements at level 2 (where the root is at level 1) is 100/1000. InQa, each step of schema mapping swapsbelements at level 3 under aelement with ones underbelement. This mapping is shown in Figure 18. The left graph in Figure 19 shows the execution times for naive queries(N(100) and N(1000), optimized queries(O(100) and O(1000)) and query rewriting costs plus optimized queries(R+O(100) and R+O(1000)) for two documents, respectively. Note that since naive queries produce redundant intermediate results in propor-tional to the number of steps, the execution times are linear with respect to steps. Whereas, since optimized queries rewritten by our prototype system always de-generate to queries to the extensional DB only, the execution time remain constant. Thisﬁgure shows that the rewriting costs are not neglectable when the steps are increased. Most of this overhead comes from the global environment that is kept in memory. Since the global environment is only used in solving reverse axis, it can be safely discarded when input queries include forward axis only. This optimiza-tion will be incorporated in the next version of our prototype system. For an even number of steps, our prototype system rewritesQa(n) into the optimized query, (doc(“d1.xml”)/s/a/b,doc(“d1.xml”)/s/b/b,()).

Qb(n), which is for a document “d2.xml” is shown in Figure 21

(28)

let $r1:=<r1>{for$t1indoc(“d2.xml”)/s/t

return<t>{(<a>{$t1/b/b}</a>, {$t1/a/b})}</t>}

</r1>

return

let $r2:=<r2>{for$t2in$r1/t

return<t>{(<a>{$t2/b/b}</a>, {$t2/a/b})}</t>}

</r2>

return

...

let$rn:=<rn>{for$tnin$r(n-1)/t

return<t>{(<a>{$tn/b/b}</a>, {$tn/a/b})}</t>}

</rn>

return

let$v:=($rn/t/b,$rn/t/a)

return$v/b

Figure 21: Test queries Qb(n)

s a b ... 10/100 ... b

b b b

10/100 t a b ... 10/100 ... b

b b b

10/100 t r1 t b b ... 10/100 ... a b b b 10/100 t

...

... a b b 10/100 b b ... 10/100 b 1 step 10/100

...

10/100

...

Figure 22: Schema mappings in Qb

inQa. Compared toQa, R+Oalways outperformN showing that eliminated re-dundant evaluation costs of intermediate results that are (repeatedly) created by the “for” expressions always exceeded the rewriting overhead. For an odd number of steps, our prototype system rewritesQb(n)into the following optimized query;

for$t1indoc(“d2.xml”)/s/treturn($t1/b/b,($t1/a/b,()))

7 Conclusion

(29)

its static emulation of XML store and assignment of extended Deweys’ to the expressions, which leads to easy construction of correct fusion transformation.

References

W.N. Chin. Safe fusion of functional expressions. InProc. Conference on Lisp and Functional Programming, pages 11–20, San Francisco, California, June 1992. Leonidas Fegaras and David Maier. Optimizing object queries using an effective

calculus. ACM Trans. Database Syst., 25(4):457–516, 2000. ISSN 0362-5915. doi: http://doi.acm.org/10.1145/377674.377676.

Bilel Gueni, Talel Abdessalem, Bogdan Cautis, and Emmanuel Waller. Pruning Nested XQuery Queries. InCIKM, pages 541–550, 2008.

Jan Hidders, Jan Paredaens, Roel Vercammen, and Serge Demeyer. A Light but Formal Introduction to XQuery. InSecond International XML Database Sym-posium,(XSym2004), pages 5–20, 2004.

Christoph Koch. On the role of composition in XQuery. InProceedings of Eighth International Workshop on the Web and Databases (WebDB 2005), 2005. Jiaheng Lu, Tok Wang Ling, Chee-Yong Chan, and Ting Chen. From Region

Encoding To Extended Dewey: On Efﬁcient Processing of XML Twig pattern Matching. InProc of VLDB, 2005.

Ioana Manolescu, Cedric Miachon, and Philippe Michiels. Towards micro-benchmarking xquery. In Proceedings of the First International Workshop on Performance and Evaluation of Data Management Systems (EXPDB 2006), 2006.

Jim Melton. Writing Wrongs: How Not To Build A Standard. In XIME-P (Keynote), 2008.

Wim Le Page, Jan Hidders, Philippe Michiels, Jan Paredaens, and Roel Vercam-men. On the expressive power of node construction in XQuery. InProceedings of Eighth International Workshop on the Web and Databases (WebDB 2005), 2005.

Igor Tatarinov and Alon Halevy. Ef_ﬁcient Query Reformulation in Peer Data Man-agement Systems. InProceedings of the ACM International Conference on Man-agement of Data, pages 539–550, 2004.

(30)

P. Wadler. Deforestation: Transforming programs to eliminate trees. InProc. ESOP (LNCS 300), pages 344–358, 1988.